# MoTrPAC R packages
library(MotrpacRatTraining6mo) # v1.5.0
# MotrpacRatTraining6moData v1.8.0 is automatically attached
# for plotting
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE)These package provides functions to fetch, explore, and reproduce the processed data and downstream analysis results presented in the main paper for the first large-scale multi-omic multi-tissue endurance exercise training study conducted in young adult rats by the Molecular Transducers of Physical Activity Consortium (MoTrPAC). Find the preprint on bioRxiv.
While functions in MotrpacRatTraining6moData can be used by themselves, they were primarily written to analyze data in the MotrpacRatTraining6moData R package.
Note: Data objects from the MotrpacRatTraining6moData
package are indicated as variable names in all caps,
e.g. PHENO, TRNSCRPT_LIVER_RAW_COUNTS.
Tip: To learn more about any data object or
function, use ? to retrieve the documentation, e.g.,
?METAB_FEATURE_ID_MAP, ?load_sample_data. Note
that both MotrpacRatTraining6mo and
MotrpacRatTraining6moData must be installed and loaded with
library() for this to work.
MoTrPAC is a national research consortium designed to discover and perform preliminary characterization of the range of molecular transducers (the “molecular map”) that underlie the effects of physical activity in humans. The program’s goal is to study the molecular changes that occur during and after exercise and ultimately to advance the understanding of how physical activity improves and preserves health. The six-year program is the largest targeted NIH investment of funds into the mechanisms of how physical activity improves health and prevents disease. See motrpac.org and motrpac-data.org for more details.
Details of the experimental design can be found in the supplementary methods of the bioRxiv preprint. Briefly, 6-month-old young adult rats were subjected to progressive endurance exercise training for 1, 2, 4, or 8 weeks, with tissues collected 48 hours after the last training bout. Sex-matched sedentary, untrained rats were used as controls. Whole blood, plasma, and 18 solid tissues were analyzed using genomics, proteomics, metabolomics, and protein immunoassay technologies, with most assays performed in a subset of these tissues. Depending on the assay, between 3 and 6 replicates per sex per time point were analyzed.
It is important to be aware of the tissue and assay abbreviations because they are used to name data objects and define arguments for many functions.
The vectors of abbreviations are also available in
TISSUE_ABBREV and ASSAY_ABBREV:
TISSUE_ABBREV## [1] "ADRNL" "BAT" "BLOOD" "COLON" "CORTEX" "HEART" "HIPPOC" "HYPOTH"
## [9] "KIDNEY" "LIVER" "LUNG" "OVARY" "PLASMA" "SKM-GN" "SKM-VL" "SMLINT"
## [17] "SPLEEN" "TESTES" "VENACV" "WAT-SC"
ASSAY_ABBREV## [1] "ACETYL" "ATAC" "IMMUNO" "METAB" "METHYL" "PHOSPHO" "PROT"
## [8] "TRNSCRPT" "UBIQ"
Let’s see how we can use these R packages to identify an interesting shared multi-omic signal in the heart (HEART) and gastrocnemius (SKM-GN) muscle tissues.
First, let’s look at a set of functions used to easily load data from
the MotrpacRatTraining6moData R package.
Use load_sample_data() to load sample-level data for a
specific ome and tissue.
?load_sample_data
# heart protein expression
heart_prot = load_sample_data("HEART", "PROT")## PROT_HEART_NORM_DATA
# heart immunoassay data
heart_immuno = load_sample_data("HEART", "IMMUNO")## IMMUNO HEART normalized data from IMMUNO_NORM_DATA_FLAT
# gastroc gene expression
gastroc_rna = load_sample_data("SKM-GN", "TRNSCRPT")## TRNSCRPT_SKMGN_NORM_DATA
# gastroc metabolomics
gastroc_metab = load_sample_data("SKM-GN", "METAB")## METAB SKM-GN normalized data from METAB_NORM_DATA_FLAT
Let’s look at the format of these data frames.
heart_prot[1:8,1:5]## feature feature_ID tissue assay 90237015805
## 1 <NA> XP_017456475.1 HEART PROT 0.00769
## 2 <NA> XP_017447817.1 HEART PROT 0.09326
## 3 <NA> NP_446341.1 HEART PROT -0.11469
## 4 <NA> NP_071796.2 HEART PROT 0.05903
## 5 <NA> NP_001157776.1 HEART PROT 1.24616
## 6 PROT;HEART;XP_003754359.1 XP_003754359.1 HEART PROT -0.04140
## 7 <NA> XP_017459445.1 HEART PROT -0.03179
## 8 <NA> XP_017453129.1 HEART PROT -0.12237
The feature column is non-NA only for training-regulated
features, which provides a convenient way of identifying this subset of
features.
Sometimes it is more convenient to have all-numeric data frames. We
can easily convert these data frames to this format with
df_to_numeric().
df_to_numeric(heart_prot, rownames = "feature_ID")[1:5,1:5]## 90237015805 90245015805 90441015805 90420015805 90289015805
## XP_017456475.1 0.00769 -0.05952 0.10992 -0.00229 0.04674
## XP_017447817.1 0.09326 -0.08879 0.31763 0.07739 -0.06031
## NP_446341.1 -0.11469 0.10623 0.08138 0.08464 -0.16039
## NP_071796.2 0.05903 -0.06490 0.05673 -0.07494 0.06263
## NP_001157776.1 1.24616 0.63626 0.15487 -0.39231 -0.64953
Note that load_sample_data() also prints a message with
the name of the object in MotrpacRatTraining6moData that is
being returned. Therefore, instead of using
load_sample_data(), we could call the object directly.
PROT_HEART_NORM_DATA[1:8,1:5]## feature feature_ID tissue assay 90237015805
## 1 <NA> XP_017456475.1 HEART PROT 0.00769
## 2 <NA> XP_017447817.1 HEART PROT 0.09326
## 3 <NA> NP_446341.1 HEART PROT -0.11469
## 4 <NA> NP_071796.2 HEART PROT 0.05903
## 5 <NA> NP_001157776.1 HEART PROT 1.24616
## 6 PROT;HEART;XP_003754359.1 XP_003754359.1 HEART PROT -0.04140
## 7 <NA> XP_017459445.1 HEART PROT -0.03179
## 8 <NA> XP_017453129.1 HEART PROT -0.12237
Let’s plot the sample-level data for a single differential feature.
head(heart_prot$feature[!is.na(heart_prot$feature)])## [1] "PROT;HEART;XP_003754359.1" "PROT;HEART;NP_058935.2"
## [3] "PROT;HEART;XP_006252013.1" "PROT;HEART;XP_017452487.1"
## [5] "PROT;HEART;XP_001058477.1" "PROT;HEART;XP_017444239.1"
plot_feature_normalized_data(feature="PROT;HEART;XP_003754359.1",
add_gene_symbol = TRUE)
plot_feature_normalized_data(feature="PROT;HEART;XP_003754359.1",
add_gene_symbol = TRUE,
facet_by_sex = TRUE)We can also use combine_normalized_data() to load
normalized data from multiple omes or tissues simultaneously. For
example, let’s load all non-epigenetic data from the gastroc.
all_gastroc = combine_normalized_data(tissues="SKM-GN", include_epigen = FALSE)## Warning in combine_normalized_data(tissues = "SKM-GN", include_epigen = FALSE):
## 'include_epigen' is FALSE. Excluding ATAC and METHYL data.
## IMMUNO SKM-GN normalized data from IMMUNO_NORM_DATA_FLAT
## METAB SKM-GN normalized data from METAB_NORM_DATA_FLAT
## PHOSPHO_SKMGN_NORM_DATA
## PROT_SKMGN_NORM_DATA
## TRNSCRPT_SKMGN_NORM_DATA
all_gastroc[1:5,1:8]## feature feature_ID tissue assay dataset 10023259
## 1 <NA> ADIPONECTIN SKM-GN IMMUNO ADIPONECTIN 15.129967
## 2 <NA> PAI-1/SERPIN-E SKM-GN IMMUNO SERPIN-E 5.727920
## 3 IMMUNO;SKM-GN;GCSF GCSF SKM-GN IMMUNO rat-mag27plex 4.087463
## 4 <NA> EOTAXIN SKM-GN IMMUNO rat-mag27plex 5.475733
## 5 <NA> GMCSF SKM-GN IMMUNO rat-mag27plex 4.426265
## 10024735 10025626
## 1 15.150025 15.124909
## 2 5.491853 5.321928
## 3 4.044394 4.000000
## 4 5.087463 5.375039
## 5 4.247928 4.807355
# how many features?
nrow(all_gastroc)## [1] 50712
# how many differential features?
nrow(all_gastroc[!is.na(all_gastroc$feature),])## [1] 2296
# what omes?
table(all_gastroc$assay[!is.na(all_gastroc$feature)])##
## IMMUNO METAB PHOSPHO PROT TRNSCRPT
## 7 338 694 691 566
Note that column names are now PIDs instead of vial labels. This is because measurements from multiple vial labels (samples) from the same PID (animal) are included in the same column.
Next, let’s load the differential analysis results. Recall there are
two types of differential analysis results: timewise and
training. We use the adjusted p-value from the
training differential analysis to identify training-regulated
features (selection_fdr); we use the summary statistics
from the timewise differential analysis to quantify time- and
sex- specific training effects. The results below are the
timewise results and adjusted p-values from
the training results. Lesser-used additional training
summary statistics are available through
load_training_da().
First, let’s consider training-regulated features only. These are
conveniently provided in TRAINING_REGULATED_FEATURES, which
we can easily subset to any data set of interest.
head(TRAINING_REGULATED_FEATURES)## feature assay assay_code tissue tissue_code
## 1 ACETYL;HEART;NP_001003673.1_K477k ACETYL prot-ac HEART t58-heart
## 2 ACETYL;HEART;NP_001003673.1_K477k ACETYL prot-ac HEART t58-heart
## 3 ACETYL;HEART;NP_001003673.1_K477k ACETYL prot-ac HEART t58-heart
## 4 ACETYL;HEART;NP_001003673.1_K477k ACETYL prot-ac HEART t58-heart
## 5 ACETYL;HEART;NP_001003673.1_K477k ACETYL prot-ac HEART t58-heart
## 6 ACETYL;HEART;NP_001003673.1_K477k ACETYL prot-ac HEART t58-heart
## feature_ID non_redundant_feature_ID platform sex training_group
## 1 NP_001003673.1_K477k <NA> <NA> female 1w
## 2 NP_001003673.1_K477k <NA> <NA> female 2w
## 3 NP_001003673.1_K477k <NA> <NA> female 4w
## 4 NP_001003673.1_K477k <NA> <NA> female 8w
## 5 NP_001003673.1_K477k <NA> <NA> male 1w
## 6 NP_001003673.1_K477k <NA> <NA> male 2w
## timewise_logFC timewise_logFC_se timewise_p_value timewise_zscore
## 1 -0.02737365 0.07346830 0.71228171 -0.37259
## 2 -0.01935973 0.07346830 0.79410231 -0.26351
## 3 -0.05102628 0.07346830 0.49311823 -0.69453
## 4 -0.16677605 0.07346830 0.03116679 -2.27004
## 5 0.03933023 0.08168714 0.63486235 0.48147
## 6 0.10815561 0.06773140 0.12435304 1.59683
## meta_reg_het_p meta_reg_pvalue training_p_value training_q
## 1 NA NA 0.002814029 0.0275727
## 2 NA NA 0.002814029 0.0275727
## 3 NA NA 0.002814029 0.0275727
## 4 NA NA 0.002814029 0.0275727
## 5 NA NA 0.002814029 0.0275727
## 6 NA NA 0.002814029 0.0275727
heart_gastroc_sig = TRAINING_REGULATED_FEATURES[TRAINING_REGULATED_FEATURES$tissue %in% c("HEART","SKM-GN"),]
# how many differential features?
nrow(heart_gastroc_sig)## [1] 47176
# what tissues and omes?
table(heart_gastroc_sig$tissue, heart_gastroc_sig$assay)##
## ACETYL ATAC IMMUNO METAB METHYL PHOSPHO PROT TRNSCRPT UBIQ
## HEART 3600 600 40 4544 856 3488 5544 5776 192
## SKM-GN 0 3536 56 2384 952 5552 5528 4528 0
We can also load the full set of differential analysis results. This consumes some memory, so let’s skip this for now.
# for a single tissue
heart_da = combine_da_results(tissues="HEART")
# for multiple tissues
heart_gastroc_da = combine_da_results(tissues=c("SKM-GN","HEART"))Plot the differential analysis results for a single training-regulated feature.
# plot the first differential feature
head(heart_gastroc_sig$feature)## [1] "ACETYL;HEART;NP_001003673.1_K477k" "ACETYL;HEART;NP_001003673.1_K477k"
## [3] "ACETYL;HEART;NP_001003673.1_K477k" "ACETYL;HEART;NP_001003673.1_K477k"
## [5] "ACETYL;HEART;NP_001003673.1_K477k" "ACETYL;HEART;NP_001003673.1_K477k"
plot_feature_logfc(feature = "ACETYL;HEART;NP_001003673.1_K477k", facet_by_sex = TRUE, add_gene_symbol = TRUE)
# how does this compare to the sample-level data?
plot_feature_normalized_data(feature = "ACETYL;HEART;NP_001003673.1_K477k", facet_by_sex = TRUE, add_gene_symbol = TRUE)Across all tissues and omes, we had over 30,000 training-regulated features. We used a graphical approach to identify groups of features with similar temporal dynamics. See more details in the “Get Started” tutorial.
The graph representation of the repfdr results are
provided in GRAPH_COMPONENTS (node and edge lists) and
GRAPH_STATES (data frame representation). Features are
specified in the format “[ASSAY_ABBREV];[TISSUE_ABBREV];[feature_ID]”,
e.g. “PHOSPHO;SKM-GN;NP_001006973.1_S34s”,
“METAB;SKM-GN;1-methylnicotinamide”.
Let’s start exploring the heart and gastroc graphical clustering results features by plotting the graphs themselves.
Start by plotting graphs for gastroc and heart separately.
# gastroc
# get_tree_plot_for_tissue(tissues=c("SKM-GN"))
get_tree_plot_for_tissue(tissues=c("SKM-GN"),
omes=c("PROT","PHOSPHO","TRNSCRPT"),
max_trajectories = 3,
parallel_edges_by_ome = TRUE,
curvature = 0.5,
edge_alpha_range = c(0.5, 1))## Using "sugiyama" as default layout
## Warning: Ignoring `graph` as layout is already calculated
## ℹ Pass the calculated layout to the `graph` argument to silence this warning
# heart
# get_tree_plot_for_tissue(tissues=c("HEART"),
# max_trajectories = 3,
# parallel_edges_by_ome = TRUE,
# curvature = 1,
# edge_alpha_range = c(0.5, 1))
get_tree_plot_for_tissue(tissues=c("HEART"),
omes=c("PROT","PHOSPHO","TRNSCRPT","METAB","ACETYL"),
max_trajectories = 3,
parallel_edges_by_ome = TRUE,
curvature = 0.8,
edge_alpha_range = c(0.5, 1))## Using "sugiyama" as default layout
## Warning: Ignoring `graph` as layout is already calculated
## ℹ Pass the calculated layout to the `graph` argument to silence this warning
Now plot both heart and gastroc features together. For the sake of visualization, let’s limit it to popular omes in both tissues.
# edges by tissue
get_tree_plot_for_tissue(tissues=c("SKM-GN","HEART"),
omes=c("PROT","PHOSPHO","TRNSCRPT","METAB"),
max_trajectories = 3,
parallel_edges_by_tissue = TRUE,
curvature = 0.5,
edge_alpha_range = c(0.5, 1))## Using "sugiyama" as default layout
## Warning: Ignoring `graph` as layout is already calculated
## ℹ Pass the calculated layout to the `graph` argument to silence this warning
# edges by ome
get_tree_plot_for_tissue(tissues=c("SKM-GN","HEART"),
omes=c("PROT","PHOSPHO","TRNSCRPT","METAB"),
max_trajectories = 3,
parallel_edges_by_ome = TRUE,
curvature = 0.8,
edge_alpha_range = c(0.5, 1))## Using "sugiyama" as default layout
## Warning: Ignoring `graph` as layout is already calculated
## ℹ Pass the calculated layout to the `graph` argument to silence this warning
There’s a lot going on in each of these three trajectories. We will focus on these three paths in the heart and gastroc for the rest of this workshop.
Let’s determine the features that belong to each of these paths. This can be done one of a few ways, but I’ll show two here.
First, how can we get the names of these paths without actually
having to type it all out? Since we’re looking at the three
largest paths, we can use
extract_tissue_sets().
sets = extract_tissue_sets(tissues=c("HEART","SKM-GN"), k = 3, add_week8 = FALSE)
# select paths only
names(sets)## [1] "8w_F1_M1"
## [2] "4w_F1_M1"
## [3] "8w_F-1_M-1"
## [4] "4w_F1_M1---8w_F1_M1"
## [5] "1w_F0_M0---2w_F0_M0"
## [6] "2w_F1_M1---4w_F1_M1"
## [7] "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1"
## [8] "1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1"
## [9] "1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1"
sets = sets[grepl("->",names(sets))]
names(sets)## [1] "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1"
## [2] "1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1"
## [3] "1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1"
lapply(sets, head)## $`1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1`
## [1] "ACETYL;HEART;NP_001004220.1_K116k" "ACETYL;HEART;NP_001006973.1_K401k"
## [3] "ACETYL;HEART;NP_001007621.1_K354k" "ACETYL;HEART;NP_001014183.1_K193k"
## [5] "ACETYL;HEART;NP_001014183.1_K199k" "ACETYL;HEART;NP_001030123.1_K581k"
##
## $`1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1`
## [1] "ACETYL;HEART;NP_001020795.1_K31k" "ACETYL;HEART;NP_001162612.1_K438k"
## [3] "ACETYL;HEART;NP_001188284.1_K17k" "ACETYL;HEART;NP_001316822.1_K4k"
## [5] "ACETYL;HEART;NP_037320.1_K811k" "ACETYL;HEART;NP_058771.2_K240k"
##
## $`1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1`
## [1] "ACETYL;HEART;NP_001005550.1_K709k" "ACETYL;HEART;NP_001094198.1_K204k"
## [3] "ACETYL;HEART;NP_037023.1_K276k" "ACETYL;HEART;NP_059002.2_K247k"
## [5] "ACETYL;HEART;NP_071579.1_K78k" "ACETYL;HEART;NP_071579.1_K94k"
Note that this function does not separate paths by tissue when constructing the lists of features.
If we weren’t looking at the top N paths, we could construct the name
of the each path and subset GRAPH_STATES. In addition to
subsetting the path column as shown here, we can use the
state_*w columns to select features in any node, edge, or
path.
head(GRAPH_STATES)## feature ome
## ACETYL;HEART;NP_001003673.1_K477k ACETYL;HEART;NP_001003673.1_K477k ACETYL
## ACETYL;HEART;NP_001004072.2_K77k ACETYL;HEART;NP_001004072.2_K77k ACETYL
## ACETYL;HEART;NP_001004072.2_K83k ACETYL;HEART;NP_001004072.2_K83k ACETYL
## ACETYL;HEART;NP_001004085.1_K186k ACETYL;HEART;NP_001004085.1_K186k ACETYL
## ACETYL;HEART;NP_001004085.1_K261k ACETYL;HEART;NP_001004085.1_K261k ACETYL
## ACETYL;HEART;NP_001004085.1_K268k ACETYL;HEART;NP_001004085.1_K268k ACETYL
## tissue feature_ID state_1w state_2w
## ACETYL;HEART;NP_001003673.1_K477k HEART NP_001003673.1_K477k F0_M0 F0_M0
## ACETYL;HEART;NP_001004072.2_K77k HEART NP_001004072.2_K77k F0_M0 F0_M0
## ACETYL;HEART;NP_001004072.2_K83k HEART NP_001004072.2_K83k F-1_M-1 F-1_M-1
## ACETYL;HEART;NP_001004085.1_K186k HEART NP_001004085.1_K186k F0_M1 F0_M1
## ACETYL;HEART;NP_001004085.1_K261k HEART NP_001004085.1_K261k F0_M1 F0_M1
## ACETYL;HEART;NP_001004085.1_K268k HEART NP_001004085.1_K268k F0_M1 F0_M1
## state_4w state_8w
## ACETYL;HEART;NP_001003673.1_K477k F0_M0 F-1_M-1
## ACETYL;HEART;NP_001004072.2_K77k F1_M0 F1_M0
## ACETYL;HEART;NP_001004072.2_K83k F0_M-1 F0_M0
## ACETYL;HEART;NP_001004085.1_K186k F0_M0 F0_M-1
## ACETYL;HEART;NP_001004085.1_K261k F0_M0 F0_M-1
## ACETYL;HEART;NP_001004085.1_K268k F0_M0 F0_M-1
## path
## ACETYL;HEART;NP_001003673.1_K477k 1w_F0_M0->2w_F0_M0->4w_F0_M0->8w_F-1_M-1
## ACETYL;HEART;NP_001004072.2_K77k 1w_F0_M0->2w_F0_M0->4w_F1_M0->8w_F1_M0
## ACETYL;HEART;NP_001004072.2_K83k 1w_F-1_M-1->2w_F-1_M-1->4w_F0_M-1->8w_F0_M0
## ACETYL;HEART;NP_001004085.1_K186k 1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
## ACETYL;HEART;NP_001004085.1_K261k 1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
## ACETYL;HEART;NP_001004085.1_K268k 1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
## tissue_path
## ACETYL;HEART;NP_001003673.1_K477k HEART:1w_F0_M0->2w_F0_M0->4w_F0_M0->8w_F-1_M-1
## ACETYL;HEART;NP_001004072.2_K77k HEART:1w_F0_M0->2w_F0_M0->4w_F1_M0->8w_F1_M0
## ACETYL;HEART;NP_001004072.2_K83k HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F0_M-1->8w_F0_M0
## ACETYL;HEART;NP_001004085.1_K186k HEART:1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
## ACETYL;HEART;NP_001004085.1_K261k HEART:1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
## ACETYL;HEART;NP_001004085.1_K268k HEART:1w_F0_M1->2w_F0_M1->4w_F0_M0->8w_F0_M-1
clusters = list()
#paths = names(sets)
paths = c("1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1", "1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1", "1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1")
tissues = c("SKM-GN", "HEART")
for (path in paths){
for(t in tissues){
name = paste0(t, ":", path)
clusters[[name]] = as.vector(na.omit(GRAPH_STATES$feature[GRAPH_STATES$path == path &
GRAPH_STATES$tissue == t]))
}
}
lapply(clusters, length)## $`SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1`
## [1] 249
##
## $`HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1`
## [1] 357
##
## $`SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1`
## [1] 229
##
## $`HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1`
## [1] 177
##
## $`SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1`
## [1] 107
##
## $`HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1`
## [1] 138
Let’s move forward with this list of features per cluster. We’ll rename them to make them easier to keep track of.
MY_CLUSTERS = clustersWe can also use the check_cluster_res_format() function
to transform the list to an annotated data frame.
MY_CLUSTER_DF = check_cluster_res_format(clusters)
head(MY_CLUSTER_DF)## feature
## 1 ATAC;SKM-GN;chr15:101579022-101579761
## 2 ATAC;SKM-GN;chr1:221194280-221195125
## 3 ATAC;SKM-GN;chr1:221199295-221200124
## 4 ATAC;SKM-GN;chr20:34579127-34579734
## 5 ATAC;SKM-GN;chr5:114013062-114014681
## 6 ATAC;SKM-GN;chrX:119151643-119154560
## cluster ome tissue
## 1 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 2 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 3 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 4 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 5 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## 6 SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 ATAC SKM-GN
## feature_ID
## 1 chr15:101579022-101579761
## 2 chr1:221194280-221195125
## 3 chr1:221199295-221200124
## 4 chr20:34579127-34579734
## 5 chr5:114013062-114014681
## 6 chrX:119151643-119154560
We can visualize the composition of these clusters in terms of tissues and omes.
plot_features_per_cluster(MY_CLUSTERS) # could also use MY_CLUSTER_DF as inputWe can also look at the trajectories of these features’ abundances over the training time course. Let’s look at a single cluster.
plot_feature_trajectories(MY_CLUSTERS$`SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1`,
title="SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1")One important tool we use to interpret the biology underlying graphical clusters is pathway enrichment analysis. In this section, we look at functions to explore the existing pathway enrichment results presented in the manuscript.
To perform your own pathway enrichment analysis, either with MoTrPAC data from this study or your own data, see
cluster_pathway_enrichment(),gene_pathway_enrichment(), andcustom_cluster_pathway_enrichment().
For the manuscript, we performed pathway enrichment for the 2 largest
nodes, 2 largest edges, 10 largest non-null paths, and all 8-week nodes
from the graphical representation of training-regulated features in each
tissue. This set of graphical clusters can be extracted using
extract_main_clusters(). Pathway enrichment results were
adjusted over all tests using IHW with tissue as
a covariate. These pathway enrichment results are available in
GRAPH_PW_ENRICH.
Because our 3 paths of interest are among the 10 largest paths in
both tissues, the pathway enrichment results for these features are
already available in GRAPH_PW_ENRICH. Let’s take a
look.
head(GRAPH_PW_ENRICH)## query significant term_size query_size intersection_size precision
## 1 query_1 TRUE 63 65 18 0.2769231
## 2 query_1 TRUE 23 65 12 0.1846154
## 3 query_1 TRUE 24 65 9 0.1384615
## 4 query_1 TRUE 25 65 9 0.1384615
## 5 query_1 TRUE 34 65 10 0.1538462
## 6 query_1 TRUE 23 65 8 0.1230769
## recall term_id source term_name
## 1 0.2857143 KEGG:01200 KEGG Carbon metabolism
## 2 0.5217391 KEGG:00020 KEGG Citrate cycle (TCA cycle)
## 3 0.3750000 KEGG:00640 KEGG Propanoate metabolism
## 4 0.3600000 KEGG:01212 KEGG Fatty acid metabolism
## 5 0.2941176 KEGG:00280 KEGG Valine, leucine and isoleucine degradation
## 6 0.3478261 KEGG:00071 KEGG Fatty acid degradation
## effective_domain_size source_order parents
## 1 1394 184 KEGG:00000
## 2 1394 3 KEGG:00000
## 3 1394 111 KEGG:00000
## 4 1394 186 KEGG:00000
## 5 1394 32 KEGG:00000
## 6 1394 11 KEGG:00000
## evidence_codes
## 1 KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 2 KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 3 KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 4 KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 5 KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## 6 KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
## intersection
## 1 ENSRNOG00000003653,ENSRNOG00000015020,ENSRNOG00000008103,ENSRNOG00000011329,ENSRNOG00000008755,ENSRNOG00000001177,ENSRNOG00000024128,ENSRNOG00000009994,ENSRNOG00000010277,ENSRNOG00000018522,ENSRNOG00000023520,ENSRNOG00000007895,ENSRNOG00000006364,ENSRNOG00000028557,ENSRNOG00000017481,ENSRNOG00000013949,ENSRNOG00000005686,ENSRNOG00000050843
## 2 ENSRNOG00000003653,ENSRNOG00000015020,ENSRNOG00000008103,ENSRNOG00000024128,ENSRNOG00000009994,ENSRNOG00000010277,ENSRNOG00000023520,ENSRNOG00000007895,ENSRNOG00000006364,ENSRNOG00000017481,ENSRNOG00000013949,ENSRNOG00000005686
## 3 ENSRNOG00000015029,ENSRNOG00000008755,ENSRNOG00000001177,ENSRNOG00000018522,ENSRNOG00000006364,ENSRNOG00000028557,ENSRNOG00000017481,ENSRNOG00000005686,ENSRNOG00000050843
## 4 ENSRNOG00000018114,ENSRNOG00000011413,ENSRNOG00000008755,ENSRNOG00000001177,ENSRNOG00000015840,ENSRNOG00000010697,ENSRNOG00000018522,ENSRNOG00000013766,ENSRNOG00000010800
## 5 ENSRNOG00000015029,ENSRNOG00000008063,ENSRNOG00000001177,ENSRNOG00000010697,ENSRNOG00000018522,ENSRNOG00000013766,ENSRNOG00000010800,ENSRNOG00000006364,ENSRNOG00000028557,ENSRNOG00000050843
## 6 ENSRNOG00000018114,ENSRNOG00000008843,ENSRNOG00000008755,ENSRNOG00000001177,ENSRNOG00000010697,ENSRNOG00000018522,ENSRNOG00000013766,ENSRNOG00000010800
## gost_adj_p_value computed_p_value cluster tissue ome kegg_id
## 1 2.237958e-09 5.204554e-11 HEART:8w_F1_M1 HEART ACETYL <NA>
## 2 2.237958e-09 3.414274e-11 HEART:8w_F1_M1 HEART ACETYL <NA>
## 3 9.644681e-06 4.485898e-07 HEART:8w_F1_M1 HEART ACETYL <NA>
## 4 1.161548e-05 6.753184e-07 HEART:8w_F1_M1 HEART ACETYL <NA>
## 5 1.875400e-05 1.308419e-06 HEART:8w_F1_M1 HEART ACETYL <NA>
## 6 5.014476e-05 4.081550e-06 HEART:8w_F1_M1 HEART ACETYL <NA>
## adj_p_value graphical_cluster
## 1 2.101458e-08 8w_F1_M1
## 2 8.894370e-09 8w_F1_M1
## 3 4.472744e-05 8w_F1_M1
## 4 1.054033e-04 8w_F1_M1
## 5 1.259332e-04 8w_F1_M1
## 6 3.424679e-04 8w_F1_M1
writeLines(colnames(GRAPH_PW_ENRICH))## query
## significant
## term_size
## query_size
## intersection_size
## precision
## recall
## term_id
## source
## term_name
## effective_domain_size
## source_order
## parents
## evidence_codes
## intersection
## gost_adj_p_value
## computed_p_value
## cluster
## tissue
## ome
## kegg_id
## adj_p_value
## graphical_cluster
The cluster column in GRAPH_PW_ENRICH
prepends the cluster name with the tissue abbreviation. The
graphical_cluster column does not. Let’s get all
significant heart and gastroc enrichment results for these three
paths.
enrich_res = GRAPH_PW_ENRICH[GRAPH_PW_ENRICH$cluster %in% names(MY_CLUSTERS) & GRAPH_PW_ENRICH$adj_p_value < 0.1,]
table(enrich_res$cluster, enrich_res$ome)##
## ACETYL ATAC METAB
## HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1 2 2 0
## HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1 7 0 1
## HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 16 0 2
## SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1 0 0 0
## SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1 0 0 0
## SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 0 2 0
##
## METHYL PHOSPHO PROT
## HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1 7 0 2
## HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1 0 9 3
## HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 0 29 30
## SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1 0 0 1
## SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1 0 0 1
## SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 0 0 28
##
## TRNSCRPT
## HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1 20
## HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1 20
## HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 11
## SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1 5
## SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1 10
## SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1 15
Okay, so now we know how pathway enrichment was performed. Great! Now let’s get to my favorite part - how to visualize large numbers of significant pathway enrichments.
After performing pathway enrichment for our graphical clusters of interest, we observed that many clusters had many dozens of significantly enriched pathways. In order to make these results more digestible, we leveraged visNetwork to construct interactive networks of pathway enrichments. These networks are conceptually similar to those made by the EnrichmentMap Cytoscape module, but it’s easier to generate them and explore the underlying data.
Briefly, each node in the network is a pathway, and edges connect highly similar pathways, i.e., those whose enrichments are driven by highly similar sets of genes across omes. This groups together highly similar pathway enrichments, which makes it easier to digest large numbers of results.
In our implementation, groups of similar pathway enrichments are color-coded, and group labels are inferred from the term names and parents of the pathways in the group. Larger nodes indicate that the pathway was significantly enriched in multiple datasets; thicker edges indicate higher similarity between the pathways.
Best of all, these networks are interactive! Use your cursor to zoom and drag, and hover over nodes and edges to see more information about the underlying data, namely the genes and datasets driving the pathway enrichment.
Note that you could run this function with any pathway enrichment
results, as long as the columns of the input include “adj_p_value”,
“ome”, “tissue”, “intersection”, “computed_p_value”, “term_name”, and
“term_id” and the columns of feature_to_gene include
“gene_symbol”, “ensembl_gene”. feature$ensembl_gene can be
a dummy column if intersection_id_type = "gene_symbol".
Let’s focus on a single trajectory, first separately in each tissue. While it is possible to programatically render vizNetwork objects, it is not trivial, so let’s keep it simple here.
names(MY_CLUSTERS)## [1] "SKM-GN:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1"
## [2] "HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1"
## [3] "SKM-GN:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1"
## [4] "HEART:1w_F-1_M-1->2w_F-1_M-1->4w_F-1_M-1->8w_F-1_M-1"
## [5] "SKM-GN:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1"
## [6] "HEART:1w_F0_M1->2w_F0_M1->4w_F1_M1->8w_F1_M1"
enrichment_network_vis(tissues = "SKM-GN",
cluster = "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1",
add_group_label_nodes = TRUE)## Formatting inputs...
## Subsetting feature-to-gene map...
## Calculating similarity metric between pairs of enriched pathways...
## Constructing graph...
## Elapsed time: 15.391s
enrichment_network_vis(tissues = "HEART",
cluster = "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1",
title = "HEART:1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1",
add_group_label_nodes = TRUE)## Formatting inputs...
## Subsetting feature-to-gene map...
## Calculating similarity metric between pairs of enriched pathways...
## Constructing graph...
## Elapsed time: 2.707s
Now let’s examine pathways that are enriched in the same graphical
cluster in multiple tissues (heart and gastroc in our case). If we are
specifically interested in pathways that are significantly enriched in
multiple tissues, we can set multitissue_pathways_only to
TRUE. This removes any pathways that are not enriched in
more than one tissue (in at least one ome).
enrichment_network_vis(tissues = c("HEART","SKM-GN"),
cluster = "1w_F1_M1->2w_F1_M1->4w_F1_M1->8w_F1_M1",
multitissue_pathways_only = TRUE,
add_group_label_nodes = TRUE,
title = "'All up' path in heart and gastroc - shared enrichments")## Formatting inputs...
## Subsetting feature-to-gene map...
## Calculating similarity metric between pairs of enriched pathways...
## Constructing graph...
## Elapsed time: 2.868s
sessionInfo()## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.4.0 MotrpacRatTraining6mo_1.5.0
## [3] MotrpacRatTraining6moData_1.8.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 numDeriv_2016.8-1.1 tools_4.2.2
## [4] bslib_0.4.2 utf8_1.2.2 R6_2.5.1
## [7] DBI_1.1.3 BiocGenerics_0.44.0 colorspace_2.0-3
## [10] sn_2.1.0 withr_2.5.0 tidyselect_1.2.0
## [13] gridExtra_2.3 mnormt_2.1.1 compiler_4.2.2
## [16] cli_3.6.0 Biobase_2.58.0 TFisher_0.2.0
## [19] sandwich_3.0-2 labeling_0.4.2 sass_0.4.4
## [22] scales_1.2.1 mvtnorm_1.1-3 r2r_0.1.1
## [25] stringr_1.5.0 digest_0.6.31 rmarkdown_2.19
## [28] pkgconfig_2.0.3 htmltools_0.5.4 plotrix_3.8-2
## [31] fastmap_1.1.0 limma_3.54.0 highr_0.10
## [34] htmlwidgets_1.6.1 rlang_1.0.6 rstudioapi_0.14
## [37] visNetwork_2.1.2 jquerylib_0.1.4 farver_2.1.1
## [40] generics_0.1.3 zoo_1.8-11 jsonlite_1.8.4
## [43] dplyr_1.0.10 magrittr_2.0.3 Matrix_1.5-3
## [46] Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3
## [49] viridis_0.6.2 lifecycle_1.0.3 stringi_1.7.12
## [52] multcomp_1.4-20 yaml_2.3.6 edgeR_3.40.1
## [55] ggraph_2.1.0 mathjaxr_1.6-0 MASS_7.3-58.1
## [58] grid_4.2.2 ggrepel_0.9.2 lattice_0.20-45
## [61] graphlayouts_0.8.4 splines_4.2.2 multtest_2.54.0
## [64] locfit_1.5-9.7 qqconf_1.3.1 knitr_1.41
## [67] pillar_1.8.1 igraph_1.3.5 codetools_0.2-18
## [70] stats4_4.2.2 mutoss_0.1-12 glue_1.6.2
## [73] evaluate_0.19 metap_1.8 data.table_1.14.6
## [76] vctrs_0.5.1 tweenr_2.0.2 Rdpack_2.4
## [79] gtable_0.3.1 purrr_1.0.1 polyclip_1.10-4
## [82] tidyr_1.2.1 assertthat_0.2.1 cachem_1.0.6
## [85] xfun_0.36 ggforce_0.4.1 rbibutils_2.2.13
## [88] tidygraph_1.2.2 survival_3.5-0 viridisLite_0.4.1
## [91] tibble_3.1.8 ellipsis_0.3.2 TH.data_1.1-1